Value Pursuit Iteration

Authors

Amir Massoud Farahmand

Doina Precup

Abstract

Value Pursuit Iteration (VPI) is an approximate value iteration algorithm that finds a close to optimal policy for reinforcement learning problems with large state spaces. VPI has two main features: First, it is a nonparametric algorithm that finds a good sparse approximation of the optimal value function given a dictionary of features. The algorithm is almost insensitive to the number of irrelevant features. Second, after each iteration of VPI, the algorithm adds a set of functions based on the currently learned value function to the dictionary. This increases the representation power of the dictionary in a way that is directly relevant to the goal of having a good approximation of the optimal value function. We theoretically study VPI and provide a finite-sample error upper bound for it.
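The two features described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the abstract does not specify how the sparse approximation is computed or which functions are added to the dictionary. The sketch assumes plain matching pursuit as the greedy sparse fit, a hypothetical deterministic chain MDP with a known model (so Bellman backups are exact rather than estimated from samples), and it adds the fitted value function itself as the new dictionary element. All function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(target, dictionary, n_atoms):
    """Greedy sparse fit (assumed stand-in): returns (coefficients, approximation)."""
    residual = list(target)
    coefs = [0.0] * len(dictionary)
    for _ in range(n_atoms):
        # Pick the atom with the largest normalized correlation to the residual.
        best, best_score = None, 0.0
        for j, atom in enumerate(dictionary):
            norm = math.sqrt(dot(atom, atom))
            if norm == 0.0:
                continue  # skip all-zero atoms
            score = abs(dot(residual, atom)) / norm
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break  # residual is orthogonal to every atom
        atom = dictionary[best]
        step = dot(residual, atom) / dot(atom, atom)
        coefs[best] += step
        residual = [r - step * a for r, a in zip(residual, atom)]
    approx = [t - r for t, r in zip(target, residual)]
    return coefs, approx


def bellman_backup(values, gamma=0.9):
    """Optimal Bellman backup on a toy chain: move left or right, reward 1 at the right end."""
    n = len(values)
    backed_up = []
    for s in range(n):
        left, right = max(s - 1, 0), min(s + 1, n - 1)
        reward = 1.0 if s == n - 1 else 0.0
        backed_up.append(reward + gamma * max(values[left], values[right]))
    return backed_up


def value_pursuit_sketch(dictionary, n_states, n_iters=20, n_atoms=3):
    """Approximate value iteration with a sparse fit and a growing dictionary."""
    dictionary = [list(atom) for atom in dictionary]  # do not mutate the caller's list
    values = [0.0] * n_states
    for _ in range(n_iters):
        target = bellman_backup(values)
        # Feature 1: sparse approximation of the backed-up values.
        _, values = matching_pursuit(target, dictionary, n_atoms)
        # Feature 2 (assumption): add the learned value function as a new atom.
        dictionary.append(list(values))
    return values, dictionary
```

Calling `value_pursuit_sketch` with a few hand-picked feature vectors of length `n_states` returns the final value estimate and the grown dictionary. Because each fitted value function becomes a single atom, later iterations can reuse it and spend the remaining atom budget on corrections, which is one simple way to read the abstract's claim that the added functions are directly relevant to approximating the optimal value function.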

Similar resources

Dynamic Programming for One-sided Partially Observable Pursuit-evasion Games

We study two player pursuit-evasion games with concurrent moves, infinite horizon, and discounted rewards. The players have partial observability, however, the evader is given an advantage of knowing the current position of the units of the pursuer. We show that (1) value functions of this game depend only on the position of the pursuing units and the belief the pursuer has about the position o...

CONVERGENCE OF THE LINEARIZED BREGMAN ITERATION FOR l1-NORM MINIMIZATION

One of the key steps in compressed sensing is to solve the basis pursuit problem $\min_{u \in \mathbb{R}^n} \{\|u\|_1 : Au = f\}$. Bregman iteration was very successfully used to solve this problem in [40]. Also, a simple and fast iterative algorithm based on linearized Bregman iteration was proposed in [40], which is described in detail with numerical simulations in [35]. A convergence analysis of the smoothed version ...

Convergence of the linearized Bregman iteration for ℓ1-norm minimization

Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information

We consider zero-sum stochastic games with finite state and action spaces, perfect information, mean payoff criteria, without any irreducibility assumption on the Markov chains associated to strategies (multichain games). The value of such a game can be characterized by a system of nonlinear equations, involving the mean payoff vector and an auxiliary vector (relative value or bias). We develop...

On the statistics of matching pursuit angles

Matching Pursuit decompositions have been employed for signal coding. For this purpose, Matching Pursuit coefficients need to be quantized. However, their behavior has been shown to be chaotic in some cases; posing difficulties to their modeling and quantizer design. In this work, a different approach is presented. Instead of trying to model the statistics of Matching Pursuit coefficients, the ...

Publication date: 2012
